Customizing Vector Instruction Set Architectures
Abstract
Data-level parallelism (DLP) can be exploited to improve processor performance for certain workload types. Two main application fields rely on DLP: multimedia and scientific computing. Most existing multimedia vector extensions use sub-word parallelism and wide data paths to process independent, mainly integer, values in parallel. Classic vector supercomputers, on the other hand, rely on efficient processing of the large floating-point arrays typically found in scientific applications. In both cases, selecting an appropriate instruction set architecture (ISA) is crucial for exploiting the available DLP and achieving high performance. The main objective of this thesis is to develop a methodology for synthesizing customized vector ISAs for various application domains, targeting high-performance program execution. To that end, a number of applications from the telecommunication and linear algebra domains have been studied, and custom vector instruction sets have been synthesized. Three algorithms that compute shortest paths in a directed graph (Dijkstra, Floyd and Bellman-Ford) have been analyzed, along with the widely used Linpack floating-point benchmark. The customization framework comprised the GNU C Compiler, versions 4.1.2 and 2.7.2.3, and the SimpleScalar-3.0d tool set, extended to simulate customized vector units. The modifications applied to the simulator include the addition of a vector register file, vector functional units and specific vector instructions. The main results are overall application speedups of 24.88X for Dijkstra (after both code optimization and vectorization), 4.99X for Floyd, 9.27X for Bellman-Ford and 4.33X for the C version of Linpack. These results suggest a consistent improvement in execution times due to the customized vector instruction sets.
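To make the kind of data-level parallelism discussed in the abstract concrete, here is a minimal Python sketch of the Floyd shortest-path kernel with its inner loop strip-mined into fixed-length chunks. It is not the thesis's instruction set, which the abstract does not specify: `vmin_add` is a hypothetical stand-in for a custom vector instruction, `VL` is an assumed vector length, and the input is assumed to have a zero diagonal and no negative cycles.

```python
INF = float("inf")
VL = 4  # assumed vector length (hypothetical, not taken from the thesis)


def vmin_add(dst, src, scalar):
    """Emulate one hypothetical vector instruction:
    dst[i] = min(dst[i], src[i] + scalar) for every element of the chunk."""
    return [min(d, s + scalar) for d, s in zip(dst, src)]


def floyd_scalar(dist):
    """Reference scalar Floyd (Floyd-Warshall) all-pairs shortest paths."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def floyd_strip_mined(dist, vl=VL):
    """Same algorithm, with the j loop processed vl elements at a time.

    Hoisting d[i][k] out of the j loop is safe under the stated assumption
    (zero diagonal, no negative cycles), because d[i][k] cannot change
    during iteration k.
    """
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            dik = d[i][k]  # scalar operand broadcast to the whole chunk
            for j0 in range(0, n, vl):
                j1 = min(j0 + vl, n)  # the last chunk may be shorter than vl
                d[i][j0:j1] = vmin_add(d[i][j0:j1], d[k][j0:j1], dik)
    return d
```

Each call to `vmin_add` corresponds to the work a single vector instruction could perform on one chunk of a row, which is where a vector unit would replace `vl` scalar compare-and-update steps. Calling `floyd_strip_mined` on an adjacency matrix whose size is not a multiple of `VL` exercises the shorter final chunk.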
Related resources
Customization of an embedded RISC CPU with SIMD extensions for video encoding: A case study
This work presents a detailed case study in customizing a configurable, extensible, 32-bit RISC processor with vector/SIMD instruction extensions for the efficient execution of block-based video-coding algorithms utilizing a proprietary co-design environment. In addition to the default Full-Search motion estimation of the MPEG-2 Test Model 5, fourteen fast ME algorithms were implemented in both...
Customizing the Datapath and ISA of Soft VLIW Processors
In this paper, we examine the trade-offs in performance and area due to customizing the datapath and instruction set architecture of a soft VLIW processor implemented in a high-density FPGA. In addition to describing our processor, we describe a number of microarchitectural optimizations we used to reduce the area of the datapath. We also describe the tools we developed to customize, generate, ...
A Comparison Between Processor Architectures for Multimedia Applications
The efficient processing of MultiMedia Applications (MMAs) is currently one of the main bottlenecks in the media processing field. Many architectures have been proposed for processing MMAs such as VLIW, superscalar (general-purpose processor enhanced with a multimedia extension such as MMX), vector architectures, SIMD architectures, and reconfigurable computing devices. The question then arises...
Simple ASIC Complex ASIC RaPiD FPGA GARP DPGA SuperSpeculative RAW TRACE (Multiscalar) SMT VECTOR
Poor scalability of superscalar architectures with increasing instruction-level parallelism (ILP) has resulted in a trend towards statically scheduled horizontal architectures such as Very Large Instruction Word (VLIW) processors and their more sophisticated successors called Explicitly Parallel Instruction Computing (EPIC) architectures. We extend the EPIC model with additional capabilities to...
For embedded applications with data-level parallelism, a vector processor offers high performance at low power consumption and low design complexity. Unlike superscalar and VLIW designs, a vector processor is scalable and can optimally match specific
Designers of embedded processors have typically optimized for low power consumption and low design complexity to minimize cost. Performance was a secondary consideration. Nowadays, many embedded systems (set-top boxes, game consoles, personal digital assistants, and cell phones) commonly perform computation-intensive media tasks such as video processing, speech transcoding, graphics, and high-b...